This paper describes a framework for automated classification and labeling of patterns in electroencephalographic (EEG) and
magnetoencephalographic (MEG) data. We describe recent progress on four goals: (1) specification of rules and concepts that
capture expert knowledge of event-related potential (ERP) patterns in visual word recognition; (2) implementation of rules in
an automated data processing and labeling stream; (3) data mining techniques that lead to refinement of rules; and (4) iterative
steps towards system evaluation and optimization. This process combines top-down, or knowledge-driven, methods with
bottom-up, or data-driven, methods. As illustrated here, these methods are complementary and can lead to development of tools for
pattern classification and labeling that are robust and conceptually transparent to researchers. The present application focuses on
patterns in averaged EEG (ERP) data. We also describe efforts to extend our methods to represent patterns in MEG data, as well as
electromagnetic (EM) patterns in source (anatomical) space. The broader aim of this work is to design an ontology-based system to
support cross-laboratory, cross-paradigm, and cross-modal integration of brain functional data. Tools developed for this project
are implemented in MATLAB and are freely available on request.
1. INTRODUCTION

The complexity of brain electromagnetic (EM) data has led
to a variety of processes for EM pattern classification and
labeling over the past several decades. The absence of a common
framework may account for the dearth of statistical
meta-analyses in this field. Such cross-lab, cross-paradigm
reviews are critical for establishing basic findings in science.
However, reviews in the EM literature tend to be informal,
rather than statistical: it is difficult to generalize across
datasets that are classified and labeled in different ways.

To address this problem, we have designed a framework
to support automated classification and labeling of patterns
in electroencephalographic (EEG) and magnetoencephalographic
(MEG) data. In the present paper, we describe the
framework architecture and present an application to averaged
EEG (event-related potentials, or ERP) data collected
in a visual word recognition paradigm. Results from this
study illustrate the importance of combining top-down and
bottom-up approaches. In addition, they suggest the need
for ongoing system evaluation to diagnose potential sources
of error in component analysis, classification, and labeling.
We conclude by discussing alternative analysis pathways and
ways to improve efficiency of implementation and testing of
alternative methods. It is our hope that this framework can
support increased collaboration and integration of ERP results
across laboratories and across study paradigms.

1.1. Classification of ERPs

A standard technique for analysis of EEG data involves averaging
across segments of data (trials), time-locking to stimulus
or response events. The resulting measures are characterized
by a sequence of positive and negative deflections distributed
across time and space (scalp locations). In principle,
activity that is not event-related will tend towards zero
as the number of averaged trials increases. In this way, ERPs
provide an increased signal-to-noise ratio, and thus increased
sensitivity, to functional (e.g., task) manipulations. Signal
averaging assumes that the brain signals of interest are
time-locked to (or "evoked by") the events of interest. As
illustrated in recent work on induced (non-time-locked) versus
evoked (time-locked) EEG activity, this assumption does not
always hold ([1, 2]).
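
The averaging principle described above can be illustrated with a minimal sketch. It assumes that each trial is a list of voltage samples already time-locked to the event; the function name `average_trials` and the toy data are hypothetical and are not part of the tools described in this paper.

```python
from statistics import mean


def average_trials(trials):
    """Average time-locked trials sample by sample to form an ERP.

    `trials` is a list of equal-length lists of voltages (one list per trial).
    """
    if not trials:
        raise ValueError("at least one trial is required")
    n_samples = len(trials[0])
    if any(len(t) != n_samples for t in trials):
        raise ValueError("all trials must have the same number of samples")
    # zip(*trials) groups the k-th sample of every trial together.
    return [mean(samples) for samples in zip(*trials)]


# Toy data: a fixed evoked response plus trial-specific "noise" that is
# constructed to cancel exactly across the two trials.
evoked = [0.0, 1.0, 3.0, 1.0, 0.0]
noise = [0.5, -0.5, 0.25, -0.25, 0.5]
trial_a = [e + n for e, n in zip(evoked, noise)]
trial_b = [e - n for e, n in zip(evoked, noise)]
erp = average_trials([trial_a, trial_b])
```

In real data the noise does not cancel exactly; it only shrinks as more trials are averaged, which is why the evoked (time-locked) part of the signal dominates the average.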

In the past several decades, researchers have described
several dozen spatiotemporal ERP patterns (or components),
which are thought to index a variety of neuropsychological
processes. Some patterns are observed across a range of
experimental contexts, reflecting domain-general processes,
such as memory, decision-making, and attention. Other patterns
are observed in response to specific types of stimuli,
reflecting human expertise in domains such as mathematics,
face recognition, and reading comprehension (for reviews see
[3, 4]). Previous investigations of these patterns have
demonstrated the effectiveness of ERP methods for addressing basic
questions in nearly every area of psychology.

Given the success of this methodology, ERPs are likely
to remain at the forefront of research in clinical and cognitive
neuroscience, even as newer methods for EEG and
MEG analyses are developed as alternatives to signal averaging
(e.g., [1, 2, 5-7]).

At the same time, ERP methods face some important
challenges. A key challenge is to identify standardized methods
for measure generation, as well as objective and reliable
methods for identification and labeling of ERP components.
Traditionally, researchers have characterized ERP
components with respect to both physiological (spatial, temporal)
and functional criteria [8, 9]. Physiological criteria include
latency and scalp distribution, or topography. For example,
as illustrated in Figure 1, the visual "P100 component"
is characterized by a positive deflection that peaks at
~100 milliseconds after onset of a visual stimulus (a) and is
maximal over occipital electrodes, reflecting activity in visual
cortex (b).

Despite general agreement on criteria for ERP component
identification [9], in practice such patterns can be hard
to identify, particularly in individual subjects. This difficulty
is due in part to the superposition of patterns generated by
multiple brain regions at each time point [10], leading to
complex spatial patterns that reflect the mixing of underlying
patterns. Given this complexity, ERP researchers have
adopted a variety of solutions for scalp topographic analysis
(e.g., [11, 12]). It can therefore be difficult to compare results
from different studies, even when the same experimental
stimuli and task are used.

Similarly, researchers use a variety of methods for describing
temporal patterns in ERP data [13]. For example,
early components, such as the P100, tend to be characterized
by their peak latency, while the time course of later components,
such as the N400 or P300, is typically captured by
averaging over time "windows" (e.g., 300-500 milliseconds).
The latency of other components, such as the N400, has been
quantified in a variety of ways. Finally, there is variability
in how functional information (e.g., subject-, stimulus-, or
task-specific variables) is used in ERP pattern classification.
Some patterns, such as the P100, are easily observed as large
deflections in the raw ERP waveforms. Other patterns, such
as the mismatch negativity, are more reliably seen in difference
measures, calculated by subtracting ERP amplitude in
one condition from the ERP amplitude in a contrasting condition.
This inconsistency may lead to confusion, particularly
when the same label is used to refer to two different measures,
as is often the case.

1.2. Outline of paper

In summary, the complexity of ERP data has led to multiple
processes for measure generation and pattern classification
that can vary considerably across different experimental
paradigms and across research laboratories. Ultimately, this
limits the ability both to replicate prior results and to generalize
across findings to achieve high-level interpretations of
ERP patterns.

In light of these challenges, the goal of this paper is
to describe a framework for automated classification and
labeling of ERP patterns. The framework presented here
comprises both top-down (knowledge-driven) and bottom-up
(data-driven) methods for ERP pattern analysis, classification,
and labeling. Below, we describe this framework
in detail (Section 2) and present an application to patterns
in ERP data from a visual word processing paradigm
(Section 3). Section 4 describes approaches to system evaluation.
Section 5 describes data mining for refinement of
expert-driven (top-down) methods. In Section 6, we draw
some general conclusions and discuss extensions of our
framework for representation of patterns in source space,
and ontology development to support cross-paradigm,
cross-laboratory, and cross-modal integration of results in
EM research.

2.  PATTERN  CLASSIFICATION  FRAMEWORK 

As  illustrated  in  Figure  2,  our  framework  comprises  five  main 
processes. 

(i) Knowledge engineering. Known ERP patterns are cataloged
(1). High-level rules and concepts are described
for each pattern (2).

(ii) Pattern analysis and measure generation. Analysis
methods are selected and applied to ERP data (3). The
goal is transformation of continuous spatiotemporal
data into discrete patterns for labeling. Statistics are
generated (4) to capture the rules and concepts identified
in (2).

(iii) Data mining. Unsupervised clustering (7) and supervised
learning (8) are used to explore how measures
cluster, and how these clusters may be used to identify
and label patterns using rules derived independently of
expert knowledge.

(iv) Operationalization and application of rules. Rules are
operationalized by combining metrics in (4) with prior
knowledge (2). Data mining results (7-8) may be used
to validate and refine the rules. Rules are applied to
data, using an automated labeling process (6) detailed
below.

Figure 1: (a) Time course of P100 pattern, plotted at left occipital electrode, O1. Time is plotted on the x-axis (0-700 milliseconds); each
vertical hash mark represents 100 milliseconds. Amplitude is plotted on the y-axis (scale, ±4 μV). The dark vertical line marks the time of
peak amplitude (~120 milliseconds). (b) Scalp topography of the P100 pattern, plotted at the time of peak amplitude. Red, positive. Blue,
negative.


Below, we describe how these processes have been implemented
in a series of MATLAB procedures. We then report
results from the application of this process to data from
a visual word processing experiment. Results are evaluated
against a "gold standard" that consists of expert judgments
regarding the presence or absence of patterns, and their
prototypicality, for each of 144 observations (36 subjects × 4
experimental conditions).

2.1. Knowledge engineering (processes 1, 2)

The goal of knowledge engineering is to identify concepts
that have been documented for a particular research domain.
Based on prior research on visual word processing, we have
tentatively identified eight spatiotemporal patterns that are
commonly observed from ~100 to ~700 milliseconds after
presentation of a visual word stimulus, including the P100,
N100, late N1/N2b, N3, P1r, MFN, N400, and P300. Space
limitations preclude a detailed discussion of each pattern (see
reviews in [3, 4]). The left temporal N3 and medial frontal
negativity (MFN) components are less well known, but have
been described in several high-density ERP studies of visual
word processing (e.g., [14-16]). The P1r [17] has also been
referred to as a posterior P2 [18]. The late N1/N2b has variously
been referred to as an N2, an N170, and a recognition
potential (see [15] for discussion and references). It is
not clear that the late N1/N2 represents a component that is
functionally distinct from the N1 and N3, though it sometimes
emerges in tPCA results as a distinct spatiotemporal
pattern (e.g., see Section 3). These eight patterns reflect a
working taxonomy of ERP patterns in research on visual word
processing between ~60 and 700 milliseconds. Application of the
present framework to large numbers of datasets collected
across a range of paradigms, and across different ERP research
labs, would contribute to the refinement of this taxonomy.

A note of caution is in order, concerning the labels for
scalp regions of interest (ROIs). By convention, areas of the
scalp are associated with anatomical labels, such as "occipital,"
"parietal," "temporal," and "frontal" (see Table 1). It is
well known, however, that a positive or negative deflection
over a particular scalp ROI is not necessarily generated in
cortex directly below the measured data. ERP patterns can
reflect sources tangential to the scalp surface. In this instance,
the positive and negative fields may be maximal over remote
regions of the scalp, reflecting a dipolar scalp distribution
(e.g., with a positive maximum over frontal scalp regions,
and a negative maximum over temporal scalp regions). Thus,
the ROI labels should not be interpreted as literal references
to brain regions. The ROI clusters used in the present study
are shown in Appendix A.

Table 1: Spatial and temporal concepts used to define the eight target
patterns. Regions of interest (ROIs) are defined in Appendix A.

Pattern   Window (ms)   ROI
P100      60-150        occipital
N100      151-230       occipital
N2        231-300       post-temporal
P1r       250-400       parietal
N3        250-400       left anterior
MFN       250-450       frontal
N4        350-550       parietal
P300      401-700       parietal

2.2.  Data  summary 

Prior to analysis, ERP data consist of complex waveforms
(time series), measured at multiple electrode sites. To simplify
analysis and interpretation of these data, a standard
practice is to transform the ERPs into discrete patterns.
Traditional methods for data summary include identification of
peak latency within a specified time window ("peak picking")
and computing the mean amplitude over a time window
for each electrode ("windowed analysis"), or averaged over
electrode clusters (regions of interest, or ROIs). An alternative
method is principal components analysis (PCA), which
decomposes the data into "latent" patterns, or factors. The
following subsection describes this method in detail, and
explains the utility of PCA for automated pattern classification.

Figure 2: Pattern classification and labeling scheme. Knowledge engineering (processes 1, 2) includes "top-down" specification of ERP
concepts and rules, formulated by domain experts. Component analysis and measure generation (processes 3, 4) yield summary metrics that are
used for pattern classification and labeling. Implementation and operationalization of pattern rules (processes 5, 6) are detailed in Section 2.
Data mining (processes 7, 8) includes "bottom-up" or data-driven methods for clustering and discovery of pattern rules (Section 5). System
evaluation is detailed in Section 4.
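
The two traditional summary methods mentioned above, peak picking and windowed analysis, can be sketched as follows. This is a minimal illustration for a single waveform, assuming the waveform is a list of microvolt values with a matching list of sample times; the function names and the toy values are hypothetical.

```python
def window_indices(times_ms, start_ms, end_ms):
    """Indices of samples whose time falls inside [start_ms, end_ms]."""
    return [i for i, t in enumerate(times_ms) if start_ms <= t <= end_ms]


def peak_latency(times_ms, waveform, start_ms, end_ms, polarity=1):
    """Latency (ms) of the largest positive (polarity=1) or negative
    (polarity=-1) deflection inside the window ("peak picking")."""
    idx = window_indices(times_ms, start_ms, end_ms)
    if not idx:
        raise ValueError("window contains no samples")
    best = max(idx, key=lambda i: polarity * waveform[i])
    return times_ms[best]


def window_mean(times_ms, waveform, start_ms, end_ms):
    """Mean amplitude inside the window ("windowed analysis")."""
    idx = window_indices(times_ms, start_ms, end_ms)
    if not idx:
        raise ValueError("window contains no samples")
    return sum(waveform[i] for i in idx) / len(idx)


# Toy waveform sampled every 20 ms (hypothetical values, in microvolts).
times = [0, 20, 40, 60, 80, 100, 120, 140, 160]
wave = [0.0, 0.2, 0.5, 1.0, 2.0, 3.0, 2.5, 1.5, -0.5]
p100_latency = peak_latency(times, wave, 60, 150)
p100_mean = window_mean(times, wave, 60, 150)
```

Both measures depend on the analyst's choice of window, which is one reason results can be hard to compare across laboratories.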

2.2.1. Temporal PCA methods (process 3)

PCA belongs to a class of factor-analytic procedures, which
use eigenvalue decomposition to extract linear combinations
of variables (latent "factors") in such a way as to account
for patterns of covariance in the data parsimoniously, that is,
with the fewest factors. Mathematically, the goal of PCA is to
take intercorrelated variables $(x_1, \ldots, x_n)$ and combine them
such that the transformed data, the "principal components"
(PC), are linear combinations of $x$, weighted to maximize the
amount of variance captured by each eigenvector ($v_i$):

\[ PC_i = v_{i1} x_1 + v_{i2} x_2 + \cdots + v_{in} x_n. \qquad (1) \]

In this way, the original set of variables $(x_1, \ldots, x_n)$ is
"projected" into a new data space, where the dimensions of this
new space are captured by a small number of latent factors
(the eigenvectors).
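
Equation (1) can be made concrete with a small sketch that extracts only the first principal component. It uses power iteration on the covariance matrix, which is a simplification chosen here for brevity; it is not the eigenvalue decomposition routine of any particular toolbox, and all function names are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Sample covariance of the columns of `rows` (observations x variables).

    Also returns the mean-centered data for later projection.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    return cov, centered


def first_eigenvector(cov, iterations=200):
    """Dominant eigenvector of a symmetric matrix by power iteration."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def component_scores(centered, v):
    """Equation (1): each score is the weighted sum v_1*x_1 + ... + v_n*x_n."""
    return [sum(vj * xj for vj, xj in zip(v, row)) for row in centered]


# Toy data: two perfectly correlated variables (x2 = 2 * x1).
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
cov, centered = covariance_matrix(data)
v1 = first_eigenvector(cov)
scores = component_scores(centered, v1)
```

Because the two toy variables are perfectly correlated, a single component captures all of the variance, which is the parsimony that PCA aims for.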


In ERP data, the variables $(x_1, \ldots, x_n)$ are the microvolt
readings either at consecutive time points (temporal PCA)
or at each electrode (spatial PCA). The major source of covariance
is assumed to be the ERP components, characteristic
features of the waveform that are spread across multiple time
points and multiple electrodes. Ideally, each latent factor
corresponds to a separate ERP component, providing a statistical
decomposition of the brain electrical patterns that are
superposed in the scalp-recorded data. To achieve this ideal
factor-to-pattern mapping, the factors may be "rotated" so
that the variance associated with the original variables (time
points) is redistributed across the factors in a way that
maximizes "simple structure," that is, that achieves a simple
and transparent mapping from variables to factors. (See [19]
for a review of PCA and related factor-analytic methods for
ERP data decomposition.)

In the present application, we used temporal PCA (tPCA)
as implemented in the Dien PCA Toolbox [20]. In temporal
PCA, the data are organized with the variables corresponding
to time points and observations corresponding to the different
waveforms in the dataset. The waveforms vary across
subjects, electrodes, and experimental conditions. Thus, subject,
spatial, and task variance are collectively responsible for
covariance among the temporal variables. The data matrix
is then self-multiplied and mean-corrected to produce a covariance
matrix. The covariance matrix is subjected to eigenvalue
decomposition, and the resulting nonnoise factors are
rotated using Promax to obtain a more transparent relationship
between the PCA factors and the latent variables of interest
(i.e., ERP components).

After transformation of the ERP data into factor space,
the data are projected back into the original data space, by
multiplying factor scores by factor loadings and by the standard
deviation at each time point (see the appendix in [21]).
In this way, it is possible to visualize and extract information
about the strength of the pattern at each electrode, to determine
the spatial distribution of the pattern for a given subject
and experimental condition. Visualizing the spatial projection
of each factor in this way is useful in interpreting tPCA results
(e.g., see Figure 3(b)).
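
The back-projection step described above reduces to an elementwise product for a single factor. The sketch below assumes one factor score per observation (waveform) and one loading and one standard deviation per time point; the function name and all values are hypothetical.

```python
def back_project(factor_scores, loadings, time_sds):
    """Project one factor back into microvolt space.

    factor_scores: one score per observation (waveform).
    loadings: one loading per time point.
    time_sds: standard deviation of the original data at each time point.
    Returns a list of waveforms (observations x time points).
    """
    if len(loadings) != len(time_sds):
        raise ValueError("loadings and time_sds must have the same length")
    return [[score * load * sd for load, sd in zip(loadings, time_sds)]
            for score in factor_scores]


# Hypothetical values: three observations (e.g., three electrodes for one
# subject and condition) and four time points.
scores = [2.0, -1.0, 0.5]
loadings = [0.0, 0.5, 1.0, 0.5]
sds = [1.0, 2.0, 2.0, 1.0]
projected = back_project(scores, loadings, sds)
```

Reading the projected values across electrodes at the time of the peak loading gives the scalp distribution of the factor for that subject and condition.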

For our initial attempts to automate data description
and classification, tPCA offered several advantages over
traditional methods. First, tPCA is able to separate overlapping
spatiotemporal patterns. Second, tPCA automatically
extracts a discrete set of temporal patterns. Third, when
implemented and graphed appropriately, tPCA results are easily
interpreted with respect to previous findings, as illustrated
below. tPCA is therefore easily incorporated in an
automated process for ERP pattern extraction and classification.
In the final section, we address some limitations of
tPCA as a method of ERP pattern analysis.

2.2.2.  Measure  generation  (process  4) 

For each tPCA factor, we extracted 32 summary metrics that
characterize spatial, temporal, and functional dimensions of
the data. The full set of metrics, along with their definitions,
is listed in Appendix C. Note that our expert-defined rules,
which were used for the tPCA autolabeling process, mainly
involved two metrics (see Section 2.2.3 for details): In-mean(ROI)
and TI-max. In-mean(ROI) represents the amplitude
over a region of interest (ROI), averaged over electrode clusters
for each latent factor at the time of peak latency, after the
factor has been projected back into channel space. TI-max
is the peak latency and is measured on the factor loadings,
which are sign-invariant.

Although these two metrics intuitively capture the spatial
and temporal dimensions of the ERP data that are most
salient to ERP researchers, our prior data mining results
suggested that additional metrics might improve the tPCA
autolabeling results [22, 23]. In particular, some failures in the
autolabeling process (i.e., cases where the modal factor for
a given pattern did not show a match to the rule in a given
condition, for a given subject) were due to component overlap
that remained even after tPCA. For example, in one of
our four pilot datasets [23], the P100 pattern was partially
captured by a factor corresponding to the N100. For some
subjects, most of the P100 was in fact captured by this "N100
factor." The factor showed a slow negativity, beginning before
the stimulus onset, and the P100 appeared as a positive-going
deflection that was superposed on this sustained negativity.
However, because the rule specified that the mean amplitude
over the occipital electrodes should be positive, the factor did
not meet the P100 rule criteria.

To address this issue, we implemented onset and offset
metrics. Each onset latency was estimated as the midpoint of
four consecutive sliding windows in which corresponding
t-tests (threshold, P = .05) indicated that the means of their
respective windowed signals diverged significantly from a baseline
value, typically zero. The subsequent offset was the temporal
midpoint at which the four consecutive t-tests showed
that their windowed signal means had returned to baseline. The
procedure is implemented as described in [24].
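
One way to read the onset and offset procedure is sketched below. Several details are assumptions made for illustration only: windows advance one sample at a time, significance is decided by comparing a one-sample t statistic against a fixed critical value supplied by the caller (2.776 is the two-tailed critical value for P = .05 with 4 degrees of freedom, matching five-sample windows), and all function names are hypothetical. The procedure in [24] may differ in these respects.

```python
import math
from statistics import mean, stdev


def t_statistic(samples, baseline=0.0):
    """One-sample t statistic for the mean of `samples` against `baseline`."""
    sd = stdev(samples)
    diff = mean(samples) - baseline
    if sd == 0.0:
        # Constant window: no evidence if it sits on baseline, else unbounded.
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / (sd / math.sqrt(len(samples)))


def significant_windows(signal, width, t_crit, baseline=0.0):
    """Flag each sliding window (step of one sample) with |t| > t_crit."""
    return [abs(t_statistic(signal[i:i + width], baseline)) > t_crit
            for i in range(len(signal) - width + 1)]


def first_run(flags, value, run_length=4, start=0):
    """Index of the first window that begins `run_length` consecutive
    windows equal to `value`, searching from `start`; None if absent."""
    for i in range(start, len(flags) - run_length + 1):
        if all(f == value for f in flags[i:i + run_length]):
            return i
    return None


def run_midpoint_ms(first_window, width, run_length, sample_ms):
    """Temporal midpoint of the samples covered by a run of windows."""
    last_sample = first_window + run_length - 1 + width - 1
    return (first_window + last_sample) / 2 * sample_ms


# Toy signal at 250 samples per second (4 ms per sample): noise around zero,
# a deflection of about 2 microvolts, then noise around zero again.
baseline_noise = [0.1, -0.1] * 5
deflection = [2.1, 1.9] * 6
signal = baseline_noise + deflection + baseline_noise

WIDTH, T_CRIT, RUN, SAMPLE_MS = 5, 2.776, 4, 4.0
flags = significant_windows(signal, WIDTH, T_CRIT)
onset_window = first_run(flags, True, RUN)
offset_window = first_run(flags, False, RUN, start=onset_window)
onset_ms = run_midpoint_ms(onset_window, WIDTH, RUN, SAMPLE_MS)
offset_ms = run_midpoint_ms(offset_window, WIDTH, RUN, SAMPLE_MS)
```

Requiring four consecutive significant windows guards against declaring an onset on the basis of a single noisy window.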

Using the onset latency to determine a "baseline" (0-point
or onset) for each pattern, we then computed peak-to-baseline
and baseline-to-peak metrics to capture phasic deflections
that could be confused with slow potentials. The
baseline intensity was computed as the signal mean within
an interval centered on component onset. We predicted that
data mining results would incorporate these measures to
yield improved accuracy in the labeling process.
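
The motivation for a baseline-relative metric can be seen in a small sketch. It assumes that the onset and offset are already known as sample indices and that the baseline interval extends a fixed number of samples on either side of onset; the function names, the interval width, and the toy waveform are hypothetical.

```python
def baseline_intensity(signal, onset_index, half_width):
    """Mean of the signal in an interval centered on the component onset."""
    start = max(0, onset_index - half_width)
    stop = min(len(signal), onset_index + half_width + 1)
    segment = signal[start:stop]
    return sum(segment) / len(segment)


def baseline_to_peak(signal, onset_index, offset_index, half_width=1):
    """Largest deviation from baseline between onset and offset.

    The result is signed: positive for positive-going deflections.
    """
    base = baseline_intensity(signal, onset_index, half_width)
    segment = signal[onset_index:offset_index + 1]
    peak = max(segment, key=lambda v: abs(v - base))
    return peak - base


# Toy factor waveform: a sustained negativity of about -2 microvolts with a
# positive-going deflection superposed on it (hypothetical values).
wave = [-2.0, -2.0, -2.0, -2.0, -1.0, 0.5, -1.0, -2.0, -2.0]
deflection = baseline_to_peak(wave, onset_index=2, offset_index=7)
mean_amplitude = sum(wave) / len(wave)
```

Here the mean amplitude is negative, so a rule requiring a positive mean would reject the factor, while the baseline-to-peak value correctly reports a positive-going deflection.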

In addition, we added metrics to capture variations in
amplitude due to experimental variables. Four measures
were computed: Pseudo-Known (difference in response to
nonwords versus words), RareMisses-RareHits (difference in
response to unknown rare words versus rare words that were
correctly recognized), RareHits-Known (difference in response
to rare versus low-frequency words), and Pseudo-RareMisses
(difference in nonwords versus missed rare words). Because
prior research has shown that semantic processing can affect
the N2, N3, MFN, N4, and P3 patterns, we predicted that the
data mining procedures would identify one or more of these
metrics as important for pattern classification.

2.2.3.  Rule  operationalization  (process  5) 

Rules for each ERP pattern were formulated initially based on
results from prior literature and were operationalized using
metrics defined in Process 4 (Section 2.2.2). After application
of the initial rules to test data, we evaluated the results against
a "Gold Standard" (see Section 4 for details) and modified
the pattern rules to improve accuracy. For example, after initial
testing, the visual "P100" pattern (P100v) was defined as
follows: for any $n$, $FA_n$ = P100v if and only if

(i) 80 ms < TI-max($FA_n$) < 150 ms,

(ii) In-mean(ROI) > 0,

(iii) EVENT($FA_n$) = stimon,

(iv) MODALITY(EVENT) = visual,

where $FA_n$ is defined as the nth tPCA factor, and P100v is
the visual-evoked P100 ("v" stands for "visual"). TI-max is
the time of peak amplitude, In-mean(ROI) is the mean amplitude
over the region of interest (ROI), and ROI for P100v
is specified as "occipital" (i.e., mean intensity over occipital
electrodes). "Stimon" refers to stimulus onset, which is the
event that is used for time-locking single trials to derive the
ERP. "MODALITY" refers to the stimulus modality (e.g., visual,
auditory, somatosensory, etc.). See Appendix B for a full
listing of rule formulae.
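
The P100v rule above translates directly into a predicate over a factor's metrics. The sketch below is one possible encoding; the class and field names are hypothetical and only the four criteria themselves come from the rule.

```python
from dataclasses import dataclass


@dataclass
class FactorMetrics:
    """Summary metrics for one tPCA factor (field names are hypothetical)."""
    ti_max_ms: float          # time of peak amplitude
    in_mean_occipital: float  # mean amplitude over the occipital ROI
    event: str                # time-locking event, e.g. "stimon"
    modality: str             # stimulus modality, e.g. "visual"


def is_p100v(factor: FactorMetrics) -> bool:
    """True if the factor satisfies all four P100v criteria (i)-(iv)."""
    return (
        80 < factor.ti_max_ms < 150
        and factor.in_mean_occipital > 0
        and factor.event == "stimon"
        and factor.modality == "visual"
    )


# Hypothetical factors: one that fits the rule and one that peaks too late.
candidate = FactorMetrics(ti_max_ms=112.0, in_mean_occipital=1.8,
                          event="stimon", modality="visual")
too_late = FactorMetrics(ti_max_ms=180.0, in_mean_occipital=1.8,
                         event="stimon", modality="visual")
```

Writing each rule as an explicit conjunction keeps the criteria visible to researchers, so that a threshold such as the 80 ms lower bound can be revised after evaluation.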

These rules represent informed hypotheses, based on expert
knowledge. As described below (Section 5), bottom-up
methods can be used to refine these rules. Further, as
the rules are applied to larger and more diverse sets of
data, they are likely to undergo additional refinements (see
Section 4.1).

2.2.4.  Automated  labeling  (process  6) 

For each condition, subject, and tPCA factor, we used MATLAB
to compute temporal and spatial metrics on that factor's
contribution to the scalp ERP. The values of the metrics
specified in the expert-defined rules were then compared
to rule-specific thresholds that characterized specific
ERP components. Thresholds were determined through expert
definitions that were formulated and tested as described
in Section 2.2.3. The results of the comparisons were
recorded in a true/false table, and factors meeting all criteria
were flagged as capturing the specified ERP component
for that subject and condition. All data were automatically
saved to Excel spreadsheets organized by rule, condition, and
subject.
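
The true/false table described above might be built along the following lines. This is a sketch in Python rather than the MATLAB implementation used in this work; the metric keys, the subject and condition labels, and the numeric values are all hypothetical, and the export to spreadsheets is omitted.

```python
def evaluate_rule(criteria, metrics):
    """Return one True/False entry per criterion plus an overall match flag.

    criteria: dict mapping a criterion name to a predicate on `metrics`.
    metrics: dict of metric values for one subject, condition, and factor.
    """
    row = {name: bool(test(metrics)) for name, test in criteria.items()}
    row["match"] = all(row.values())
    return row


# Two of the P100v criteria, expressed as named predicates.
P100V_CRITERIA = {
    "latency_80_150": lambda m: 80 < m["ti_max_ms"] < 150,
    "occipital_positive": lambda m: m["in_mean_occipital"] > 0,
}

# Hypothetical metrics keyed by (subject, condition, factor number).
observations = {
    ("S01", "Known", 2): {"ti_max_ms": 104, "in_mean_occipital": 2.1},
    ("S01", "Known", 3): {"ti_max_ms": 176, "in_mean_occipital": -1.4},
    ("S02", "Known", 2): {"ti_max_ms": 118, "in_mean_occipital": -0.3},
}

table = {key: evaluate_rule(P100V_CRITERIA, m)
         for key, m in observations.items()}
flagged = [key for key, row in table.items() if row["match"]]
```

Keeping the per-criterion entries, rather than only the final flag, makes it possible to see which criterion caused a factor to fail, as in the case of the third observation above.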

2.3.  Data  mining 

As described in Section 2.1, ERP patterns are typically discovered
through a "manual" process that involves visual inspection
of spatiotemporal patterns and statistical analysis to
determine how the patterns differ across experimental conditions.
While this method can lead to consensus on the high-level
rules and concepts that characterize ERP patterns in
a given domain, operationalization of these rules and concepts
is highly variable across research labs, as described in
Section 1. Bottom-up (data-driven) methods can contribute
to standardization of rules for classifying known patterns,
and possibly to discovery of new patterns as well. Here
we describe two bottom-up methods: unsupervised learning
(i.e., clustering) and supervised learning (i.e., decision tree
classifiers).


lects  the  number  of  clusters  by  maximizing  the  logarithm  of 
the  likelihood  of  future  data.  Observations  that  belong  to  the 
same  pattern  type  should  ideally  be  assigned  to  a  single  clus¬ 
ter. 

2.3.2.  Classification  (process  8) 

We  use  a  traditional  classification  technique,  called  a  deci¬ 
sion  tree  learner.  Each  internal  node  of  a  decision  tree  rep¬ 
resents  an  attribute,  and  each  leaf  node  represents  a  class  la¬ 
bel.  We  used  J48  in  WEKA,  which  is  an  implementation  of 
C4.5  algorithm  [27].  The  input  to  the  decision  tree  learner 
for  the  present  study  consisted  of  a  pattern  factor  metrics 
vector  of  dimension  32,  representing  the  32  statistical  met¬ 
rics  (Appendix  C).  Cluster  labels  were  used  as  classification 
labels.  The  labeled  data  set  was  recursively  partitioned  into 
small  subsets  as  the  tree  was  being  built.  If  the  data  instances 
in  the  same  subset  were  assigned  to  the  same  label  (class), 
the  tree  building  process  was  terminated.  We  then  derived 
If-Then  rules  from  the  resulting  decision  tree  and  compared 
them  with  expert- generated  rules. 

3.  APPLICATION:  VISUAL  WORD  PROCESSING 

The  ERP  data  for  this  study  consisted  of  144  observations  (36 
subjects  X4  experiment  conditions)  that  were  acquired  in  a 
lexical  decision  task  (see  [28]  for  details).  Participants  viewed 
word  and  pseudoword  stimuli  that  were  presented,  one  stim¬ 
ulus  at  a  time,  in  the  center  of  a  computer  monitor  and  made 
word/nonword  judgments  to  each  stimulus  using  their  right 
index  and  middle  fingers  to  depress  the  “1”  and  “2”  keys  on  a 
keyboard  (“yes”  key  counterbalanced  across  subjects).  Stim¬ 
uli  consisted  of  350  words  and  word-like  stimuli,  including 
low-frequency  words  that  were  familiar  to  subjects  (based  on 
pretesting)  and  rare  words  like  “nutant”  (which  were  unlikely 
to  be  known  by  participants).  Letters  were  lower-case  Geneva 
black,  26  dpi,  presented  foveally  on  a  white  screen.  Words  and 
nonwords  were  matched  in  mean  length  and  orthographic 
neighborhood  [29,  30]. 


2.3.  7.  Clustering  (process  7) 


In this study, we used the expectation-maximization (EM)
algorithm for clustering [25], as implemented in WEKA [26].
EM is used to approximate distributions using mixture models.
It is a procedure that iterates between the expectation (E)
and maximization (M) steps. In the E-step for clustering, the
algorithm calculates the posterior probability, $h_{ij}$, that a
sample $j$ belongs to a cluster $C_i$:

$$h_{ij} = P(C_i \mid D_j) = \frac{p(D_j \mid \theta_i)\,\pi_i}{\sum_{m=1}^{C} p(D_j \mid \theta_m)\,\pi_m}, \qquad (2)$$

where $\pi_i$ is the weight for the $i$th mixture component, $D_j$
is the measurement, $\theta_i$ is the set of parameters for
each density function, and the sum in the denominator runs over
all $C$ mixture components. In the M-step, the EM algorithm
searches for optimal parameters that maximize the sum of
weighted  log-likelihood  probabilities.  EM  automatically  se¬ 
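
As an illustration of the E-step in (2), the following minimal Python sketch computes the posterior probabilities for a one-dimensional Gaussian mixture. The Gaussian component densities, the function names, and the toy parameter values are assumptions made for illustration; they are not taken from the WEKA implementation used in this study.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def e_step(samples, weights, params):
    """Return h[i][j] = P(C_i | D_j) for every component i and sample j.

    weights[i] plays the role of pi_i and params[i] = (mean, std) plays the
    role of theta_i in Eq. (2). Gaussian components are an assumption.
    """
    h = [[0.0] * len(samples) for _ in weights]
    for j, d_j in enumerate(samples):
        # Numerators p(D_j | theta_i) * pi_i, one per component i.
        joint = [w * gaussian_pdf(d_j, mu, sd)
                 for w, (mu, sd) in zip(weights, params)]
        total = sum(joint)  # denominator: sum over all components m
        for i, value in enumerate(joint):
            h[i][j] = value / total
    return h


# Toy example (hypothetical values): two components, three measurements.
samples = [-2.0, 0.1, 2.5]
weights = [0.5, 0.5]
params = [(-2.0, 1.0), (2.0, 1.0)]
responsibilities = e_step(samples, weights, params)
```

For every sample, the posteriors over the components sum to one, so each column of `responsibilities` can be read as a soft cluster assignment.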


3.1. ERP experiment data

ERP data were recorded using a 128-channel electrode array,
with vertex recording reference [31]. Data were sampled at a
rate of 250 samples per second and were amplified with a
0.01 Hz highpass filter (time constant ~10 seconds). The raw
EEG was segmented into 1500-millisecond epochs, starting
500 milliseconds before onset of the target word. There were
four conditions of interest: correctly classified, low-frequency
words (Known); correctly classified rare words (RareHits);
rare words rated as nonwords (RareMisses); and correctly
classified nonwords (Pseudo).
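
The epoch arithmetic implied by these recording parameters can be made explicit with a short Python sketch. The function names and the sample-index convention (end index exclusive) are hypothetical and are not part of the original MATLAB tools.

```python
SAMPLING_RATE_HZ = 250   # samples per second, from the text
EPOCH_MS = 1500          # total epoch length
PRESTIMULUS_MS = 500     # portion of the epoch before word onset


def ms_to_samples(duration_ms, rate_hz=SAMPLING_RATE_HZ):
    """Convert a duration in milliseconds to a whole number of samples."""
    return duration_ms * rate_hz // 1000


def epoch_bounds(onset_sample):
    """Return (start, end) sample indices of one epoch, end exclusive."""
    start = onset_sample - ms_to_samples(PRESTIMULUS_MS)
    return start, start + ms_to_samples(EPOCH_MS)


samples_per_epoch = ms_to_samples(EPOCH_MS)
prestimulus_samples = ms_to_samples(PRESTIMULUS_MS)
```

Under these assumptions each epoch spans 375 samples, of which the first 125 precede stimulus onset.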

Segments were marked as bad if they contained ocular
artifacts (EOG > 70 µV), or if more than 20% of channels
were bad on a given trial. The artifact-contaminated trials
were excluded from further analysis.
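
The rejection criteria above can be sketched as a simple predicate. The data layout (a list of EOG samples in microvolts and one bad-channel flag per channel) and the use of the peak absolute EOG amplitude are assumptions made for illustration; the original processing stream is not reproduced here.

```python
EOG_THRESHOLD_UV = 70.0          # ocular artifact threshold from the text
MAX_BAD_CHANNEL_FRACTION = 0.20  # more than 20% bad channels rejects a trial


def is_bad_segment(eog_uv, bad_channel_flags):
    """Return True if a segment (trial) should be excluded.

    eog_uv: EOG samples for the segment, in microvolts (assumed layout).
    bad_channel_flags: one boolean per channel, True if the channel is bad.
    """
    ocular_artifact = max(abs(v) for v in eog_uv) > EOG_THRESHOLD_UV
    bad_fraction = sum(bad_channel_flags) / len(bad_channel_flags)
    return ocular_artifact or bad_fraction > MAX_BAD_CHANNEL_FRACTION


# Hypothetical 128-channel trials.
clean = is_bad_segment([12.0, -30.5, 8.1], [False] * 128)
blink = is_bad_segment([15.0, 95.0, -20.0], [False] * 128)
noisy = is_bad_segment([5.0, 6.0], [True] * 26 + [False] * 102)
```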

Segmented data were averaged across trials (within subjects
and within conditions) and digitally filtered with a 30-Hz
lowpass filter. After further channel and subject exclusion,
bad (excluded) channels were interpolated. The data were
re-referenced to the average of the recording sites [32], using
a polar average reference to correct for denser sampling over
superior, as compared with inferior, scalp locations [33, 34].
Data were averaged across individual subjects, and the resulting
“grand-averaged” ERPs were used for inspection of waveforms
and topographic plots.

4.  TPCA  AUTOLABELING  RESULTS 

Temporal  PCA  (tPCA)  was  used  to  transform  the  ERP  data 
into  a  set  of  latent  temporal  patterns  (see  Section  2.2.1  for 
details).  We  extracted  the  first  15  latent  factors  from  each  of 
the  four  datasets,  accounting  for  approximately  80%  of  the 
total  variance.  These  15  tPCA  factors  were  then  subjected  to 
a  Promax  rotation. 

After the tPCA factors were projected back into the
original data space (Section 2.2.1), we applied our
expert-defined rules to determine the percentage of observations
that matched each target pattern. Results are shown in
Table 2.

We assigned labels to the first 10 factors based on the
correspondence between the target patterns and the tPCA
factors. Results were as follows: Factor 4 = P100, Factor 3 =
N100, Factor = N2, Factor 7 = N3/P1r, Factor 2 = MFN/N4,
and Factor 9 = P3. Figure 3 displays the time course and
topography for these six pattern factors.

Note  that  many  patterns  showed  splitting  across  two  or 
more  factors.  This  may  reflect  misallocation  of  pattern  vari¬ 
ance  across  the  factors  (i.e.,  inaccuracies  in  the  tPCA  decom¬ 
position),  inaccuracies  in  rule  definitions,  or  both.  A  com¬ 
plementary  problem  is  seen  in  the  case  of  factors  2,  7,  and  10, 
which  show  matches  to  more  than  one  target  pattern.  Again, 
this  may  reflect  misallocation  of  variance.  Alternatively,  these 
results  may  suggest  a  need  to  refine  our  pattern  descriptions, 
the  rules  that  are  used  to  identify  pattern  instances,  or  both. 
In  either  case,  these  findings  point  to  the  need  for  systematic 
evaluation  of  results.  Diagnosing  potential  sources  of  error  is 
the  first  step  towards  systematic  improvements  of  methods. 

4.1. Evaluation of top-down methods

In our framework, top-down methods for pattern classification
are dependent on the accuracy of both the data summary
methods and the expert-defined rules. In particular,

(1)  data  summary  methods  should  yield  discrete  patterns 
that  reflect  different  underlying  neuropsychological 
processes,  or  “components;” 

(2)  rules  that  are  applied  to  summary  metrics  should  be 
implemented  in  a  way  that  effectively  discriminates  be¬ 
tween  separate  patterns. 

Our  initial  efforts  have  led  to  encouraging  classification  re¬ 
sults,  as  illustrated  above.  However,  several  findings  suggest 
the  need  to  consider  possible  misallocation  of  variance  in  the 
data  summary  process,  and  ways  of  optimizing  pattern  rules. 


4.1.1. Diagnosing misallocation of variance

A well-known critique of PCA methods, including temporal
PCA, is that inaccuracies in the decomposition can lead
to misallocation of variance [21, 35]. For example, in our
results, the left temporal N3 and parietal P1r patterns were
both assigned to a single factor (cf. [15] for similar results).
Recent methods can achieve separation of patterns that have
been confounded in an initial PCA (see [19] for a discussion).
A more serious problem is that of pattern splitting:
well-known patterns like the P100 are expected to map
to a single rule (factor). Indeed, this simple mapping was
obtained in 3 of our 4 pilot datasets [23]. Splitting of the
P100 across two factors therefore suggests a possible
misallocation of variance in the tPCA. A future challenge will be
to develop rigorous methods of diagnosing misallocation of
variance in the decomposition of ERPs. In the final section,
we consider alternatives to tPCA, which may address this
issue.

4.1.2. Comparison with a “gold standard”

The  validity  of  our  tPCA  autolabeling  procedures  was  as¬ 
sessed  by  comparing  autolabeling  results  with  a  “gold  stan¬ 
dard,”  which  was  developed  through  manual  labeling  of  pat¬ 
terns.  Two  ERP  analysts  visually  inspected  the  raw  ERPs  for 
each  subject  and  each  condition.  For  each  target  pattern,  the 
analysts  indicated  whether  the  pattern  was  present,  based 
on  inspection  of  temporal  data  (waveforms,  butterfly  plots) 
and  spatial  data  (topography  at  time  of  peak  activity  in  pat¬ 
tern  interval).  Analysts  also  provided  confidence  ratings  and 
rated  the  typicality  of  each  pattern  instance  using  a  3-point 
scale. 

An initial set of ratings on 100 observations (25 subjects
× 4 conditions) was collected. Raters met to discuss results
and  to  calibrate  procedures  for  subsequent  ratings.  Experts 
then  proceeded  to  label  another  116  ERP  observations  (4  ob¬ 
servations  were  omitted  due  to  a  technical  error  in  the  data 
file).  This  set  of  labeled  data  constituted  the  “gold  standard” 
for  system  evaluation. 

Interrater reliability for test data was computed for two
of the patterns (P100 and N100) using the Spearman-Brown
prophecy coefficient [36]. Results are shown in Table 3
(“*” = moderate reliability, “**” = high reliability).

For both patterns, the highest level of reliability was
reflected in the typicality ratings. In addition, reliability was
considerably higher for the P100 pattern. Inspection of the
data  revealed  that  the  low  reliability  for  N100  “presence” 
judgments  was  due  to  a  systematic  difference  in  use  of  cat¬ 
egories:  one  rater  consistently  rated  as  “not  present”  cases 
where  the  other  rater  indicated  the  pattern  was  “present”  but 
atypical  (“1”  on  typicality  scale). 

Table 2: Percentage of ERP observations for each factor that matched expert-defined rule criteria.

          % Observations meeting pattern criteria
Factor    P100    N100    N2      N3      P1r     MFN     N4      P3
Fac#01
Fac#02    -       -       -       -       -       36.81   9.72    59.72
Fac#03    -       82.64   -       -       -       -       -       -
Fac#04    82.64   -       -       -       -       -       -       -
Fac#05    -       -       -       -       -       -       -       -
Fac#06    -       -       -       -       -       -       -       -
Fac#07    -       -       69.44   42.36   64.58   22.92   -       -
Fac#08    34.72   -       -       -       -       -       -       -
Fac#09    -       -       -       -       -       -       -       56.94
Fac#10    -       51.39   51.39   -       -       -       -       -
Fac#11    -       -       -       47.92   25.69   34.03   35.42   -
Fac#12    -       -       -       -       -       -       -       -
Fac#13    -       -       -       59.03   62.50   40.97   -       -
Fac#14    -       -       -       -       -       -       -       -
Fac#15    -       -       -       -       -       -       -       9.72

Figure 3: Autoclassification and labeling results. (a) Percentage of observations matching rule criteria for each pattern, (b) topography and (c) time course of pattern factors.

Panel (a):

Pattern   Fac01   Fac02   Fac03   Fac04   Fac05   Fac06   Fac07
P1        -       -       -       82.64   -       -       -
N1        -       -       82.64   -       -       -       -
N2        -       -       -       -       -       -       69.44
N3        -       -       -       -       -       -       42.36
P1r       -       -       -       -       -       -       64.58
MFN       -       36.81   -       -       -       -       22.92
N4        -       9.72    -       -       -       -       -
P3        -       59.72   -       -       -       -       -

Table 3: Interrater reliability (Spearman-Brown r).

Pattern   Presence   Confidence   Typicality
P100      .51*       .41*         .72**
N100      -.04       .35*         .45*

Accuracy of the autolabeling procedures was defined
as the percentage of system labels that matched the
gold-standard labels (%Agr; see Table 4). Across the eight patterns,
the autolabeling results and expert ratings had an average
Pearson r correlation of +.36. This leads to an effective
interrater reliability of +.52 as measured by the Spearman-Brown
formula. Note that while the %Agr was relatively high for the
N100 (0.84), the Spearman-Brown coefficient was considerably
lower (0.41), consistent with the lower interrater reliability
observed between ERP analysts for this pattern.
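
The Spearman-Brown values reported in Table 4 are consistent with the standard two-rater form of the prophecy formula, $r_{SB} = 2r / (1 + r)$, applied to each Pearson r. The following Python sketch assumes that form (the exact computation used in the study is not stated here) and reproduces the tabled values to within about 0.01.

```python
def spearman_brown(r, k=2):
    """Spearman-Brown prophecy formula for k raters (k = 2 assumed here)."""
    return k * r / (1 + (k - 1) * r)


# Pearson r values as listed in Table 4.
pearson_r = {"P100": 0.60, "N100": 0.26, "N2": 0.12, "N3": 0.41,
             "P1r": 0.47, "MFN": 0.33, "N4": 0.37, "P3": 0.30}

predicted = {name: spearman_brown(r) for name, r in pearson_r.items()}
mean_r = sum(pearson_r.values()) / len(pearson_r)
```

For example, r = 0.60 for the P100 gives 1.20 / 1.60 = 0.75, and r = 0.26 for the N100 gives approximately 0.41.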

5.  DATA  MINING  RESULTS 

Table 4: Comparison of autolabeling with expert labels.

Pattern   Pearson r   Spearman-Brown   %Agr
P100      0.60        0.75             0.90
N100      0.26        0.41             0.84
N2        0.12        0.21             0.53
N3        0.41        0.58             0.63
P1r       0.47        0.64             0.76
MFN       0.33        0.49             0.40
N4        0.37        0.54             0.81
P3        0.30        0.46             0.64

Input to the data mining (“bottom-up”) analyses consisted
of 32 metrics for each factor, weighted across each of the
144 labeled observations (total N = 4608). Pattern labels
for each observation were a combination of the autolabeling
results (pattern present versus pattern absent for each
factor, for each observation) and the typicality ratings,
as follows. Observations that met the rule criteria (“pattern
present” according to autolabeling procedures) and were
rated as “typical” (rating > “1”) were assigned to one
category label. Observations that either failed to meet pattern
criteria (“pattern absent”) or were rated as atypical (“1” on
rating scale), or both, were assigned to a second category. The
combined labels were used to capitalize on the high reliability
and greater sensitivity of the typicality + presence/absence
ratings, as compared with the presence/absence labels by
themselves.
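
The two-category labeling scheme described above amounts to a single conjunction, sketched below in Python. The function name and the category names “pattern” and “nonpattern” are hypothetical; only the decision logic follows the text.

```python
def combined_label(rule_match, typicality):
    """Combine an autolabeling result with an expert typicality rating.

    rule_match: True if the observation met the rule criteria.
    typicality: expert rating on the 3-point scale (1 = atypical).
    """
    return "pattern" if rule_match and typicality > 1 else "nonpattern"


labels = [combined_label(match, rating)
          for match, rating in [(True, 3), (True, 1), (False, 2)]]
```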

For  the  EM  procedures,  we  set  the  number  of  clusters  to 
be  9  (8  patterns  +  nonpatterns).  We  then  clustered  the  144 
observations  derived  from  the  pattern  factors,  based  on  the 
32  metrics.  As  shown  in  Table  5,  the  assignment  of  obser¬ 
vations  to  each  of  the  9  clusters  largely  agreed  with  the  re¬ 
sults  from  the  top-down  (autolabeling)  procedures  (compare 
Table  2). 

Ideally,  each  cluster  will  correspond  to  a  unique  ERP  pat¬ 
tern.  However,  as  noted  above,  inaccuracies  in  either  the  data 
summary  (tPCA)  procedures,  or  the  expert  rules,  or  both, 
can  lead  to  pattern  splitting.  Thus,  it  is  not  surprising  that 
patterns  in  our  clustering  analysis  were  occasionally  assigned 
to two or more clusters. For instance, the P100 pattern split
into two clusters (clusters 4 and 5), consistent with the
autolabeling results (Table 2).

Supervised learning (decision tree) methods were used to
derive pattern rules, independently of expert judgments.
According to the information gain rankings of the 32 attributes,
TI-max and IN-mean(ROI) were most important, consistent
with our previous results [22]. These findings validate the use
of these two metrics in expert-defined rules. Decision trees
revealed the importance of additional spatial metrics,
suggesting the need for finer-grained characterization of pattern
topographies in our rule definitions. In addition, difference
measures (Pseudo-RareMisses and RareMisses-RareHits) were
highly ranked for certain patterns (the N2 and P300, respectively),
suggesting that functional metrics may be useful for
classification of certain target patterns.
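
To make the notion of an information gain ranking concrete, the following Python sketch computes the gain obtained by splitting a numeric attribute at a threshold. The toy latencies, labels, and threshold are hypothetical, and the sketch is not the decision tree implementation used in this study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Gain from splitting a numeric attribute at the given threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    # Weighted entropy of the two subsets after the split.
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (left, right) if part)
    return entropy(labels) - remainder


# Hypothetical toy data: peak latencies (ms) and pattern labels.
ti_max = [95, 110, 120, 180, 200, 210]
labels = ["P100", "P100", "P100", "N100", "N100", "N100"]
gain = information_gain(ti_max, labels, threshold=150)
```

In this toy case a split at 150 ms separates the two classes perfectly, so the gain equals the full label entropy of one bit; a less informative attribute would yield a smaller gain and a lower rank.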

6.  CONCLUSION 

The  goal  of  this  study  was  to  define  high-level  rules  and 
concepts  for  ERP  components  in  a  particular  domain  (vi¬ 
sual  word  recognition)  and  to  design,  evaluate,  and  optimize 
an  automated  data  processing  and  labeling  stream  that  im¬ 
plements  these  rules  and  concepts.  By  combining  rule  def¬ 
initions  based  on  expert  knowledge  (top-down  approach) 
with  rule  definitions  that  are  generated  through  data  mining 
(bottom-up  approach),  we  predicted  that  our  system  would 
achieve  higher  accuracy  than  a  system  based  on  either  ap¬ 
proach  in  isolation.  Results  suggest  that  the  combination 
of  top-down  and  bottom-up  methods  is  indeed  synergistic: 
while  domain  knowledge  was  used  effectively  to  constrain  the 
number  of  clusters  in  the  data  mining,  decision  tree  classi¬ 
fiers  revealed  the  importance  of  additional  metrics,  including 
multiple  measures  of  topography  and,  for  certain  patterns, 
functional  metrics  that  correspond  to  experiment  manipula¬ 
tions. 


Ongoing  work  is  focused  on  the  following  goals: 

(i)  refinement  of  procedures  for  expert  labeling  of  pat¬ 
terns  in  the  “raw”  (untransformed)  ERP  data; 

(ii)  testing  of  alternative  data  summary  and  autolabeling 
methods; 

(iii)  modification  of  rules  and  concepts,  based  on  integra¬ 
tion  of  bottom-up  and  top-down  classification  meth¬ 
ods. 

6.1. Alternative data summary procedures

In  the  present  study,  we  applied  temporal  PCA  (tPCA)  to  de¬ 
compose  ERP  data  into  discrete  patterns  that  are  input  to 
our  automated  component  classification  and  labeling  pro¬ 
cess.  PCA  is  a  useful  approach  because  it  is  automated,  is 
data-driven,  and  has  been  validated  and  optimized  for  de¬ 
composition  of  event-related  potentials  [21].  At  the  same 
time,  as  illustrated  here,  PCA  is  prone  to  misallocation  of 
variance  across  the  latent  factors.  Further,  differences  in  the 
time  course  of  patterns  across  subjects  and  experiment  con¬ 
ditions  are  a  particular  problem  for  tPCA  methods:  latency 
“jitter”  can  lead  to  mischaracterization  of  patterns  [7]. 

For  this  reason,  we  are  currently  testing  alternative  ap¬ 
proaches  to  ERP  component  analysis.  One  approach  involves 
application  of  sequential  (temporo-spatial)  PCA.  Temporo- 
spatial  PCA  is  a  refinement  and  extension  of  temporal  PCA 
(see  [12,  19]  for  details).  The  factor  scores  from  the  tempo¬ 
ral  PCA,  which  quantify  the  extent  to  which  their  respective 
latent  factors  are  present  in  the  ERP  data,  undergo  a  spatial 
PCA.  The  spatial  PCA  further  decomposes  the  factor  scores 
into  a  second  tier  of  latent  factors  that  capture  correlations 
between  channels  across  subjects  and  conditions.  The  latent 
factors  from  the  two  decompositions  are  then  combined  to 
yield  a  finer  decomposition  of  the  patterns  of  variance  that 
are  present  in  the  ERP  data. 

6.1.1. Windowed analysis of ERPs

The  second  approach  is  to  adopt  the  traditional  methods 
of  parsing  ERP  data  into  discrete  temporal  “windows”  for 
analysis.  By  focusing  on  temporal  windows  corresponding  to 
known  ERP  patterns,  the  algorithms  we  developed  for  ex¬ 
tracting  statistics  from  the  tPCA  factors  can  be  extended  to 
the  raw  ERP,  with  some  modification.  While  the  raw  ERP 
is  more  complex,  with  overlapping  temporo-spatial  patterns, 
the  autolabeling  process  applied  to  raw  ERPs  would  corre¬ 
spond  directly  to  the  expert  “gold  standard”  labeling  proce¬ 
dure.  Furthermore,  it  would  not  be  subject  to  one  weakness 
of  tPCA,  namely,  that  the  time  courses  of  the  factor  loadings 
are  invariant  across  subjects  and  conditions. 

6.1.2. Microstate analysis

We are also evaluating the use of microstate analysis, an
approach to ERP pattern segmentation that was introduced
by Lehmann and Skrandies [37]. Microstate analysis is a
data parsing technique that partitions the ERP into windows
based upon characteristics of its evolving topography.
Table 5: EM clustering results (NP: nonpatterns).

          Cluster
Pattern   0     1     2     3     4     5     6     7     8
P100      0     0     0     0     60    49    0     0     0
N100      1     0     0     0     0     0     7     30    77
N2        104   0     0     0     17    0     0     3     8
N3        5     0     0     0     4     2     2     40    1
P1r       11    0     14    0     14    6     5     51    0
MFN       0     0     0     56    0     9     0     0     0
N4        0     0     0     15    0     1     0     0     0
P3        0     113   0     2     0     0     0     0     0
NP        26    28    22    197   39    16    33    64    20


Consecutive  time  slices,  whose  topographies  are  similar  un¬ 
der  a  metric,  such  as  global  map  similarity,  are  grouped 
together  into  a  single  microstate.  This  microstate  in  turn 
corresponds  to  a  distinct  distribution  of  neuronal  activity. 
Microstate  analysis  may  hold  promise  for  separating  ERP 
components  that  have  minimal  temporal  overlap.  Moreover, 
this  method  has  been  implemented  as  a  fully  automated 
process  (see  [38]  for  downloadable  software  and  [39,  40] 
for  discussion  of  automated  segmentation  using  microstate 
analysis). 
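
The grouping of consecutive time slices into microstates can be sketched as follows. The Pearson spatial correlation between successive maps is used here as a simple stand-in for global map similarity, and the 0.9 threshold and toy maps are hypothetical; published microstate algorithms (see [38-40]) are more elaborate.

```python
import math


def correlation(a, b):
    """Pearson correlation between two topographies (channel value lists)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                     sum((y - mean_b) ** 2 for y in b))
    return cov / norm if norm else 0.0


def segment_microstates(maps, threshold=0.9):
    """Group consecutive time slices with similar topographies.

    Returns a list of (start, end) time-slice index pairs, end exclusive.
    """
    segments = []
    start = 0
    for t in range(1, len(maps)):
        # A drop in similarity between adjacent maps starts a new segment.
        if correlation(maps[t - 1], maps[t]) < threshold:
            segments.append((start, t))
            start = t
    segments.append((start, len(maps)))
    return segments


# Hypothetical data: four time slices over four channels.
maps = [
    [1.0, 0.5, -0.5, -1.0],
    [2.0, 1.1, -0.9, -2.0],
    [-1.0, -0.4, 0.6, 1.0],
    [-2.0, -1.0, 1.0, 2.1],
]
segments = segment_microstates(maps)
```

In this toy example the topography inverts between the second and third time slices, so the four slices are parsed into two segments of two slices each.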


6.2.  Development  of  neural  electromagnetic 
ontologies  (NEMO) 

In  previous  work  [22]  we  have  described  progress  on  the  de¬ 
sign  of  a  domain  ontology  mining  framework  and  its  ap¬ 
plication  to  EEG  data  and  patterns.  This  represents  a  first 
step  in  the  development  of  Neural  ElectroMagnetic  Ontolo¬ 
gies  (NEMO).  The  tools  that  are  developed  for  the  NEMO 
project  can  be  used  to  support  data  management  and  pattern 
analysis  within  individual  research  labs.  Beyond  this  goal, 
ontology-based  data  sharing  can  support  collaborative  re¬ 
search  that  would  advance  the  state  of  the  art  in  EM  brain 
imaging,  by  allowing  for  large-scale  metaanalysis  and  high- 
level  integration  of  patterns  across  experiments  and  imag¬ 
ing  modalities.  Given  that  researchers  currently  use  different 
concepts  to  describe  temporal  and  spatial  data,  ontology  de¬ 
velopment  will  require  us  to  develop  a  common  framework 
to  support  spatial  and  temporal  references. 

A  practical  goal  for  the  NEMO  project  is  to  build  a 
merged  ERP-ERF  ontology  for  the  reading  and  language  do¬ 
main.  This  accomplishment  would  demonstrate  the  utility  of 
ontology-based  integration  of  averaged  EEG  and  MEG  mea¬ 
sures,  and  make  strong  contributions  to  the  advancement  of 
multimodal  neuroinformatics.  To  accomplish  this  goal,  we 
have  developed  concurrent  strategies  for  representation  of 
ERP  and  ERF  data  in  sensor  space  and  in  source  (anatom¬ 
ical)  space.  To  link  to  these  ontology  databases  and  to  sup¬ 
port  integration  of  EM  measures  with  results  from  other 
neuroimaging  techniques,  we  are  working  to  extend  our  pat¬ 
tern  classification  process  to  brain-based  coordinate  systems, 
through  application  of  source  analysis  to  dense-array  EEG 
and  whole-head  MEG  datasets. 


APPENDICES 

A.  CHANNEL  GROUPINGS  FOR  SPATIAL  METRICS 
(REGIONS  OF  INTEREST— ROIS) 


Left occipital: 77, 78, 83, 84, 85, 86, 89, 90, 91, 92, 95, 96

Right occipital: 59, 60, 64, 65, 66, 67, 69, 70, 71, 72, 74, 75

Left anterotemporal: 27, 28, 33, 34, 35, 39, 40, 41, 44, 45, 46, 49, 128

Right anterotemporal: 1, 2, 109, 110, 114, 115, 116, 117, 120, 121, 122, 123, 125

Left posterotemporal: 50, 56, 57, 58, 63, 64, 65, 69

Right posterotemporal: 91, 96, 97, 100, 101, 102, 108

Medial frontal: 5, 6, 7, 12, 13, 21, 107, 113, 119

Left parietal: 7, 31, 32, 37, 38, 42, 43, 48, 52, 53, 54, 60, 61, 67

Right parietal: 78, 79, 80, 81, 86, 87, 88, 93, 94, 99, 104, 105, 106, 107

B.  ERP  PATTERN  RULES  HYPOTHESIZED  FOR 
VISUAL  WORD  RECOGNITION 

Rule #1 (pattern PT1 = P100)

Let ROI = occipital (average of left and right occipital). For any
n, FAn = PT1 iff

(i) 60 ms < TI-max(FAn) < 150 ms AND

(ii) |IN-mean(ROI)| > .4 µV AND

(iii) IN-mean(ROI) > 0.

Table 6: Statistical metrics.

Category    Metric                Description
Function    Pseudo-Known          Difference in mean intensity over ROI at time of peak latency (Nonwords-Words)
Function    RareMisses-RareHits   Difference in mean intensity over ROI at time of peak latency (RareMisses-RareHits)
Function    RareHits-Known        Difference in mean intensity over ROI at time of peak latency (RareHits-Known)
Function    Pseudo-RareMisses     Difference in mean intensity over ROI at time of peak latency (Nonwords-RareMisses)
Intensity   IN-max                Maximum intensity (in microvolts) at time of peak latency
Intensity   IN-max to Baseline    Maximum intensity (in microvolts) at time of peak latency with respect to intensity at TI-begin
Intensity   IN-min                Minimum intensity (in microvolts) at time of peak latency
Intensity   IN-min to Baseline    Minimum intensity (in microvolts) at time of peak latency with respect to intensity at TI-begin
Intensity   SP-max                Channel associated with maximum intensity, IN-max
Intensity   SP-max ROI            Channel group (ROI) containing SP-max
Intensity   SP-min                Channel associated with minimum intensity, IN-min
Intensity   SP-min ROI            Channel group (ROI) containing SP-min
Space       IN-mean ROI           Mean intensity (in microvolts) at time of peak latency for a specified channel group
Space       IN-LOCC               Mean intensity (in microvolts) at time of peak latency for left occipital channel group
Space       IN-ROCC               Mean intensity (in microvolts) at time of peak latency for right occipital channel group
Space       IN-LPAR               Mean intensity (in microvolts) at time of peak latency for left parietal channel group
Space       IN-RPAR               Mean intensity (in microvolts) at time of peak latency for right parietal channel group
Space       IN-LPTEM              Mean intensity (in microvolts) at time of peak latency for left posterior temporal channel group
Space       IN-RPTEM              Mean intensity (in microvolts) at time of peak latency for right posterior temporal channel group
Space       IN-LATEM              Mean intensity (in microvolts) at time of peak latency for left anterior temporal channel group
Space       IN-RATEM              Mean intensity (in microvolts) at time of peak latency for right anterior temporal channel group
Space       IN-LORB               Mean intensity (in microvolts) at time of peak latency for left orbital channel group
Space       IN-RORB               Mean intensity (in microvolts) at time of peak latency for right orbital channel group
Space       IN-LFRON              Mean intensity (in microvolts) at time of peak latency for left frontal channel group
Space       IN-RFRON              Mean intensity (in microvolts) at time of peak latency for right frontal channel group
Space       SP-cor                Correlation between factor topography and topography of target pattern
Time        TI-max                Latency (in milliseconds) of maximum or minimum amplitude
Time        TI-begin              Onset (in milliseconds) of waveform excursion containing peak intensity
Time        TI-end                Conclusion (in milliseconds) of waveform excursion containing peak intensity
Time        TI-duration           Duration (in milliseconds) of pattern, equal to TI-end minus TI-begin


Rule #2 (pattern PT2 = N100)

Let ROI = occipital (average of left and right occipital). For any
n, FAn = PT2 iff

(i) 151 ms < TI-max(FAn) < 229 ms AND

(ii) |IN-mean(ROI)| > .4 µV AND

(iii) IN-mean(ROI) < 0.

Rule #3 (pattern PT3 = N2)

Let ROI = occipital-temporal (average of occipital, posterior
temporal). For any n, FAn = PT3 iff

(i) 230 ms < TI-max(FAn) < 300 ms AND

(ii) |IN-mean(ROI)| > .4 µV AND

(iii) IN-mean(ROI) < 0.

Rule #4 (pattern PT4 = N3)

Let ROI = left anterior temporal. For any n, FAn = PT4 iff

(i) 250 ms < TI-max(FAn) < 400 ms AND

(ii) |IN-mean(ROI)| > .4 µV AND

(iii) IN-mean(ROI) < 0.

Rule #5 (pattern PT5 = P1r)

Let ROI = parietal temporal (average of left parietal, right
parietal). For any n, FAn = PT5 iff

(i) 250 ms < TI-max(FAn) < 400 ms AND

(ii) |IN-mean(ROI)| > .4 µV AND

(iii) IN-mean(ROI) > 0.

Rule #6 (pattern PT6 = MFN)

Let ROI = frontocentral (average of left frontocentral, right
frontocentral). For any n, FAn = PT6 iff

(i) 250 ms < TI-max(FAn) < 450 ms AND

(ii) |IN-mean(ROI)| > .4 µV AND

(iii) IN-mean(ROI) < 0.

Rule #7 (pattern PT7 = N4)

Let ROI = parietal temporal (average of left parietal, right
parietal). For any n, FAn = PT7 iff

(i) 350 ms < TI-max(FAn) < 550 ms AND

(ii) |IN-mean(ROI)| > .4 µV AND

(iii) IN-mean(ROI) < 0.

Rule #8 (pattern PT8 = P300)

Let ROI = parietal temporal (average of left parietal, right
parietal). For any n, FAn = PT8 iff

(i) 401 ms < TI-max(FAn) < 700 ms AND

(ii) |IN-mean(ROI)| > .4 µV AND

(iii) IN-mean(ROI) > 0.
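
The eight rules share one structure (a latency window, an intensity threshold, and a polarity constraint), so they can be expressed as data and applied by a single function. The Python sketch below is an illustration only: the data layout and function name are hypothetical, every latency window is read as lower < TI-max < upper, and the tools developed for this project are implemented in MATLAB rather than Python.

```python
# (pattern, ROI, latency window in ms, required sign of IN-mean(ROI))
RULES = [
    ("P100", "occipital", (60, 150), +1),
    ("N100", "occipital", (151, 229), -1),
    ("N2", "occipital-temporal", (230, 300), -1),
    ("N3", "left anterior temporal", (250, 400), -1),
    ("P1r", "parietal temporal", (250, 400), +1),
    ("MFN", "frontocentral", (250, 450), -1),
    ("N4", "parietal temporal", (350, 550), -1),
    ("P300", "parietal temporal", (401, 700), +1),
]
MIN_ABS_INTENSITY = 0.4  # threshold on |IN-mean(ROI)|, in the rules' units


def matching_patterns(ti_max, in_mean_by_roi):
    """Return the patterns whose rule criteria a factor satisfies.

    ti_max: peak latency of the factor in milliseconds.
    in_mean_by_roi: mapping from ROI name to IN-mean(ROI) for the factor.
    """
    matches = []
    for pattern, roi, (low, high), sign in RULES:
        intensity = in_mean_by_roi.get(roi)
        if intensity is None:
            continue  # no measurement for this ROI
        if (low < ti_max < high
                and abs(intensity) > MIN_ABS_INTENSITY
                and sign * intensity > 0):
            matches.append(pattern)
    return matches


# Hypothetical factor peaking at 300 ms.
factor = {"left anterior temporal": -0.8, "parietal temporal": 1.2}
result = matching_patterns(300, factor)
```

Because the N3 and P1r rules share a latency window and differ only in ROI and polarity, a single factor can satisfy both, as in this example; this is the kind of multiple matching seen for Factor 7 in Table 2.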

C.  STATISTICAL  METRICS 

For  statistical  metrics  see  Table  6. 